The Power of Bayes in Industry

Your Business Model is Your Data Generating Process

Dante Gates, PyMCon Web Series 2023

About Me

Husband and father

Philly data scientist

Wrote a haiku once

dantegates.github.io

github.com/dantegates

Outline

Getting Started

What is a DGP?


Disambiguating an overloaded term:

  • The True Data Generating Process
  • The True DGP as modeled

Modeling from first principles



  • The True Data Generating Process
  • The True DGP as modeled

Goal: Align these

Motivating Example

Stan Case Study: Golf putting

Golf putts, a first principles model


x: distance from the hole (feet), n: putts attempted, y: putts made

 x     n     y
 2  1443  1346
 3   694   577
 4   455   337
 5   353   208
 6   272   149

The true DGP


Photo Credit: insidescience.org

The DGP as Modeled


Image Source: Stan Development Team
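The geometry behind the modeled DGP can be sketched in a few lines. This is a minimal illustration, assuming the formulation used in the Stan golf case study: a putt goes in when the angular error of the shot, taken to be normal with standard deviation `sigma`, is smaller than the angle the hole subtends at that distance. The ball and hole dimensions, the helper names, and the value of `sigma` are assumptions for illustration and do not come from the slides.

```python
import math

# Assumed dimensions: standard ball and hole diameters in inches, converted to feet.
BALL_RADIUS_FT = (1.68 / 2) / 12
HOLE_RADIUS_FT = (4.25 / 2) / 12


def normal_cdf(z):
    """Standard normal CDF, computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def putt_success_probability(distance_ft, sigma):
    """P(putt goes in) when the angular error is Normal(0, sigma), in radians."""
    # Largest angular error that still sends the ball into the hole.
    threshold = math.asin((HOLE_RADIUS_FT - BALL_RADIUS_FT) / distance_ft)
    return 2.0 * normal_cdf(threshold / sigma) - 1.0


# (x, n, y) rows from the golf putting table.
golf_data = [(2, 1443, 1346), (3, 694, 577), (4, 455, 337), (5, 353, 208), (6, 272, 149)]

sigma = 0.03  # hypothetical value, not a fitted estimate
for x, n, y in golf_data:
    modeled = putt_success_probability(x, sigma)
    print(f"x={x} ft  observed={y / n:.3f}  modeled={modeled:.3f}")
```

Fitting the model amounts to learning `sigma` from the data rather than fixing it by hand as done here.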

First principles fit

Image Source: Stan Development Team

Summary
  • Began with basic assumptions
  • Sketched mental model
  • Let the math play out
  • Fit the model

Modeled outcome

Will the ball go in the hole?

-not-

y=0/1?

Credit building through E-Commerce

Sketching

Modeling loan maturation

with pm.Model():
    ...

    # Learned parameter of the mature default rate
    # as `t` -> inf., e.g. pm.Beta()
    D = ...
    
    ...

Modeling loan maturation

with pm.Model():
    ...

    D = ...
    # The mature liquidation rate as `t` -> inf.
    L = 1 - D
    
    ...

Modeling loan maturation

with pm.Model():
    ...

    D = ...
    L = 1 - D

    # The in-flight rates at time `t`
    D_t = D * some_cdf(t-T)   # some_cdf:    R⁺ -> [0, 1]
    L_t = L * another_cdf(t)  # another_cdf: R⁺ -> [0, 1]
    
    ...

Modeling loan maturation

with pm.Model():
    ...

    D = ...
    L = 1 - D

    D_t = D * some_cdf(t-T)
    L_t = L * another_cdf(t)
    # The percentage of loans in-flight at time t
    I_t = 1 - D_t - L_t
    
    ...
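To make the in-flight rates concrete, here is a minimal sketch in plain Python. It assumes exponential CDFs stand in for `some_cdf` and `another_cdf`; the slides leave both unspecified, so the function names, the rate parameters, and the example values of `D` and `T` are all hypothetical.

```python
import math


def exponential_cdf(t, rate):
    """Exponential CDF on R+; returns 0 for t <= 0."""
    return 1.0 - math.exp(-rate * t) if t > 0 else 0.0


def maturation_rates(t, D, T, default_rate=0.05, liquidation_rate=0.02):
    """Return (D_t, L_t, I_t) at time t for a mature default rate D.

    T is the delay before a loan can be counted as defaulted.
    """
    L = 1.0 - D                                      # mature liquidation rate
    D_t = D * exponential_cdf(t - T, default_rate)   # defaulted by time t
    L_t = L * exponential_cdf(t, liquidation_rate)   # liquidated by time t
    I_t = 1.0 - D_t - L_t                            # still in flight
    return D_t, L_t, I_t


for t in (0, 30, 90, 365, 3650):
    D_t, L_t, I_t = maturation_rates(t, D=0.1, T=60)
    print(f"t={t:>4}  D_t={D_t:.3f}  L_t={L_t:.3f}  I_t={I_t:.3f}")
```

Any pair of CDFs on the non-negative reals would work in the same way; the exponential is used only because it is easy to write down.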

Modeling Loan Maturation

with pm.Model():
    ...

    # technically we need to do pt.stack(...).T
    likelihood = pm.Multinomial(
        'likelihood',
        n=N,
        p=[
            D_t,
            L_t,
            I_t
        ],
        observed=[
            default_at_t,
            liquidation_at_t,
            inflight_at_t,
        ]
    )
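The stacking remark in the snippet is about layout: the likelihood scores one (default, liquidation, in-flight) triple per observation time, so the three series have to be arranged with one row per time step. The sketch below shows that layout and the multinomial log-likelihood in plain Python, without PyMC. All numbers are made up for illustration, and `multinomial_logpmf` is a helper written for this example.

```python
import math


def multinomial_logpmf(counts, probs):
    """Log-probability of `counts` under a multinomial with cell probabilities `probs`."""
    n = sum(counts)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    # Cells with a zero count contribute nothing (and may have probability 0).
    log_terms = sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)
    return log_coef + log_terms


# Hypothetical rates, one entry per observation time.
D_t = [0.00, 0.02, 0.06]
L_t = [0.10, 0.40, 0.70]
I_t = [1 - d - l for d, l in zip(D_t, L_t)]

# Hypothetical observed counts out of N = 100 loans at each time.
default_at_t = [0, 3, 5]
liquidation_at_t = [9, 41, 72]
inflight_at_t = [91, 56, 23]

# Transpose from one list per outcome to one row per time step.
p_rows = list(zip(D_t, L_t, I_t))
observed_rows = list(zip(default_at_t, liquidation_at_t, inflight_at_t))

total_loglik = sum(
    multinomial_logpmf(counts, probs)
    for counts, probs in zip(observed_rows, p_rows)
)
print(f"total log-likelihood: {total_loglik:.3f}")
```

Summing over rows treats each observation time as a separate multinomial draw, which mirrors the structure of the snippet rather than claiming it is the only reasonable choice.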

Features of model

Relationships

  • \(P(L) = 1 - P(D)\)
  • \(P(I_{t})=1-P(L_{t})-P(D_{t})\)

Constraints

  • \(P(L_{0})=P(D_{t}\vert t<T)=0\)
  • \(P(I_{0})=1\)
  • \(\lim_{t\to\infty} P(I_{t}) = 0\)
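These constraints follow from the construction rather than being imposed separately. A short derivation, assuming both maturation curves are CDFs on the non-negative reals that vanish at and below zero; the symbols \(F\) and \(G\) are introduced here for `some_cdf` and `another_cdf` and do not appear in the slides:

```latex
\begin{aligned}
P(I_t) &= 1 - P(D_t) - P(L_t) = 1 - P(D)\,F(t - T) - P(L)\,G(t) \\
P(I_0) &= 1 - P(D)\,F(-T) - P(L)\,G(0) = 1 - 0 - 0 = 1 \\
\lim_{t\to\infty} P(I_t) &= 1 - P(D)\cdot 1 - P(L)\cdot 1 = 1 - P(D) - \bigl(1 - P(D)\bigr) = 0
\end{aligned}
```

The last step uses \(P(L) = 1 - P(D)\), so every loan eventually ends in exactly one of the two terminal states.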

Initial Benefits

  • Free stuff
  • Model maps cleanly to the business model by design
  • Estimating 5 outcomes of interest
  • Robust to overfitting

Critiques

  • Models loans, not balances
  • Assumes all loans share same risk distribution
  • Assumes loans default independently
  • Does not account for growth
    • e.g. N changing over time
  • No features!?

Awesome 2.0

Modeling with recovery


with pm.Model() as model:
    ...

    # R: learned parameter of recovery, e.g. pm.Beta()
    # D_T: number of loans that defaulted `t-T` days ago
    # PD_T: number of loans in a past-due state `t-T` days ago
    # N: total number of loans
    R = ...
    D_t = (D_T + (1-R) * PD_T) / N
    D = D_t / some_cdf(t-T)
    
    # ↓↓↓ everything else same as before ↓↓↓

    ...
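The recovery adjustment is plain arithmetic once `R` is known, so it can be checked by hand. The sketch below evaluates the two formulas for a single fixed `R`, assuming an exponential CDF in place of `some_cdf`. In the model `R` is a learned parameter; the counts, the CDF rate, and the function names here are hypothetical.

```python
import math


def exponential_cdf(t, rate):
    """Exponential CDF on R+; returns 0 for t <= 0."""
    return 1.0 - math.exp(-rate * t) if t > 0 else 0.0


def default_rates_with_recovery(D_T, PD_T, N, R, t, T, cdf_rate=0.05):
    """Return (D_t, D): the in-flight and mature default rates.

    D_T:  loans that defaulted t-T days ago
    PD_T: loans in a past-due state t-T days ago
    R:    fraction of past-due loans expected to recover
    """
    if t <= T:
        raise ValueError("the mature rate is only defined for t > T")
    D_t = (D_T + (1.0 - R) * PD_T) / N
    D = D_t / exponential_cdf(t - T, cdf_rate)
    return D_t, D


D_t, D = default_rates_with_recovery(D_T=40, PD_T=20, N=1000, R=0.25, t=120, T=60)
print(f"in-flight default rate D_t = {D_t:.4f}")
print(f"implied mature default rate D = {D:.4f}")
```

Dividing by the CDF scales the rate observed so far up to the rate expected once the cohort has fully matured.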

Results

Discussion

Why Bayesian?

Priors! All Bayesian models are DGPs!
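One way to read this claim: once priors are in place, the model can generate data before any observations exist. Here is a minimal sketch in plain Python, assuming a Beta prior on the mature default rate (the slides mention `pm.Beta()` as an example). The prior parameters, cohort size, and function name are hypothetical.

```python
import random


def simulate_cohort(n_loans, alpha, beta, rng):
    """Draw a default rate from the prior, then simulate mature loan outcomes."""
    D = rng.betavariate(alpha, beta)                          # prior draw of the default rate
    defaults = sum(rng.random() < D for _ in range(n_loans))  # each loan defaults with prob. D
    liquidations = n_loans - defaults                         # every other loan liquidates
    return D, defaults, liquidations


rng = random.Random(0)  # seeded for reproducibility
for _ in range(5):
    D, defaults, liquidations = simulate_cohort(1000, alpha=2, beta=18, rng=rng)
    print(f"D={D:.3f}  defaults={defaults}  liquidations={liquidations}")
```

In PyMC the analogous step is `pm.sample_prior_predictive`, which draws from the priors and the likelihood of a model in the same forward direction.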

Why PyMCon?


Why industry?


From Wikipedia,

Why should I care?

⚠️ Disclaimer

👋

Why should I care?


  • no data, no problem
  • more than what you pay for: a principled way to extend a model

Humanz win


🧍 1
🤖 0

Perpay is Hiring

Appendix

Tips

Resources